-
Notifications
You must be signed in to change notification settings - Fork 62
feat: added client side metric instrumentation to basic rpcs #1188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: Mattie Fu <mattiefu@google.com>
| for i in range(num_retryable): | ||
| attempt = handler.completed_attempts[i] | ||
| assert isinstance(attempt, CompletedAttemptMetric) | ||
| assert attempt.end_status.name == "ABORTED" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to the metrics, but ABORTED shouldn't be retried for mutate row?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like it, and it's not listed as a retryable error in the service config. Is it in Java? It would be easy to add here
| assert attempt.end_status.value[0] == 0 | ||
| assert attempt.backoff_before_attempt_ns == 0 | ||
| assert ( | ||
| attempt.gfe_latency_ns > 0 and attempt.gfe_latency_ns < attempt.duration_ns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is gfe_latency_ns injected to the header? this should be a number instead of a range since we can set it in the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is testing against the true backend response here, not a mocked value
These are system tests, so I was trying to limit the amount of mocking used here, although some other tests do inject fake exceptions into the stream to test retry logic
| final_attempt = handler.completed_attempts[num_retryable] | ||
| assert isinstance(final_attempt, CompletedAttemptMetric) | ||
| assert final_attempt.end_status.name == "PERMISSION_DENIED" | ||
| assert final_attempt.gfe_latency_ns is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think as long as the request gets to the server, gfe_latency_ns should not be none. So this is probably related to the test setup ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this test is testing the case where we have multiple transient retryable errors, before encountering a terminal error. For the errors, I'm using a custom error_injector class, to control which errors are triggered in which sequence. So this test wouldn't ever be reaching the real backend
It's been a while since I looked at this, but I remember having issues fully controlling the backend errors in the way I needed for some of these tests, which is why I used the error_injector. There are other tests in this file that send unauthorized requests to trigger a real backend failure though.
This PR builds off of #1187 to add instrumentation to basic data client rpcs (check_and_mutate, read_modify_write, sample_row_keys, mutate_row)
Metrics are not currently being exported anywhere, just collected and dropped. A future PR will add a GCP exporter to the system